News-Oriented Automatic Chinese Keyword Indexing

Authors

  • Sujian Li
  • Houfeng Wang
  • Shiwen Yu
  • Chengsheng Xin
Abstract

In the information era, keywords are very useful for information retrieval, text clustering and similar tasks. News is a domain that consistently attracts a large amount of attention. However, the majority of news articles come without keywords, and indexing them manually is costly. Taking into account the characteristics of news articles and the resources available, this paper introduces a simple procedure for indexing keywords based on a scoring system. During indexing, we use some relatively mature linguistic techniques and tools to filter out meaningless candidate items. Furthermore, by exploiting the hierarchical relations of content words, keywords are not restricted to those extracted from the text. These methods have considerably improved our system. Finally, experimental results are given and analyzed, showing that the quality of the extracted keywords is satisfactory.
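The abstract does not spell out the scoring system, so the following is only a minimal sketch of the general idea, under stated assumptions: candidates are scored by weighted frequency, a stop list stands in for the linguistic filtering, and a small parent-concept table stands in for the hierarchical relations of content words. The names `STOPWORDS`, `HYPERNYMS`, `index_keywords`, and `title_weight` are all hypothetical and do not come from the paper. The sketch also uses whitespace-separated tokens; real Chinese text would first need word segmentation.

```python
from collections import Counter

# Hypothetical resources for illustration; the abstract does not name them.
STOPWORDS = {"the", "a", "of", "in", "and", "to"}
HYPERNYMS = {"football": "sports", "basketball": "sports"}


def index_keywords(title, body, top_k=3, title_weight=2.0):
    """Score candidate words and return the top_k as keywords."""
    scores = Counter()
    # Assumption: words in the title count more than words in the body.
    for text, weight in ((title, title_weight), (body, 1.0)):
        for token in text.lower().split():
            if token in STOPWORDS:
                continue  # filter out meaningless candidates
            scores[token] += weight
    # A parent concept accumulates the scores of its children, so a keyword
    # can be chosen even if it never appears in the text itself.
    for word, score in list(scores.items()):
        parent = HYPERNYMS.get(word)
        if parent is not None:
            scores[parent] += score
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


title = "Football final"
body = "the football team and the basketball team met in the final"
keywords = index_keywords(title, body)
```

In this toy input the parent concept `sports` outranks every word that occurs in the article, which illustrates how hierarchical relations can yield keywords that are not restricted to the text.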

Similar resources

News-Oriented Keyword Indexing with Maximum Entropy Principle

In our information era, keywords are very useful to information retrieval, text clustering and so on. News is always a domain attracting a large amount of attention. Aiming at news documents' characteristics and the resources available, this paper proposes to use Maximum Entropy (ME) model to conduct automatic keyword indexing. The focus of ME-based keyword indexing is how to obtain all the can...

Construction of Knowledge Base for Automatic Indexing and Classification Based on Chinese Library Classification

Class number, descriptor and keyword are three kinds of subject concept identifiers, among which there exist some conceptual mapping relationships, i.e. compatibility. According to this principle, we construct a CLC Knowledge Base on the basis of Chinese Library Classification for automatic indexing and classification. We compare it with the CLC system to illuminate its obvious advantages over...

Multimodal Indexing of Multilingual News Video

The problems associated with automatic analysis of news telecasts are more severe in a country like India, where there are many national and regional language channels, besides English. In this paper, we present a framework for multimodal analysis of multilingual news telecasts, which can be augmented with tools and techniques for specific news analytics tasks. Further, we focus on a set of tec...

A Chinese Automatic Text Summarization system for mobile devices

A large amount of online information is too lengthy to fit mobile devices. In order to solve this problem, we propose a method which collects original news text from online information and extracts summary sentences from it automatically. On this basis, we adopt WML (Wireless Markup Language) to build a news website for mobile devices browsing through the news summary....

Japanese Sentence Analysis For Automatic Indexing

A new method for automatic keyword extracting and "role" setting is proposed based on the Japanese sentence structure analysis. The analysis takes into account the following features of Japanese sentences, i.e., the structure of a sentence is determined by the noun-predicate verb dependency, and the case indicating words (kaku-joshi) play an important role in deep case structure. By utilizing t...

Publication date: 2003